Hadoop 1.x

Hadoop 1.x architecture is history now, because most Hadoop applications use the Hadoop 2.x architecture. Still, understanding the Hadoop 1.x architecture gives us insight into how Hadoop has evolved over time. Below is the Hadoop 1.x high-level architecture.

[Figure: Hadoop 1.x high-level components]
  • Hadoop Common Module is the Hadoop base API (a JAR file) for all Hadoop components. All other components work on top of this module.
  • HDFS stands for Hadoop Distributed File System. It is also known as HDFS V1, as it is part of Hadoop 1.x. It is used as the distributed storage system in the Hadoop architecture.
  • MapReduce is a batch processing, or distributed data processing, module. It is built by following Google’s MapReduce algorithm. It is also known as “MR V1” or “Classic MapReduce”, as it is part of Hadoop 1.x.
  • All remaining Hadoop ecosystem components work on top of these two major components: HDFS and MapReduce.
Hadoop 1.x Major Components
The major components of Hadoop 1.x are HDFS and MapReduce. They are also known as the “Two Pillars” of Hadoop 1.x.
HDFS -- HDFS is the Hadoop Distributed File System, where our Big Data is stored using commodity hardware. It is designed to work with large data sets, with a default block size of 64 MB (we can change it as per our project requirements).
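For example, a client can override the default block size when writing a file to HDFS. Here is a minimal sketch using the Hadoop 1.x Java API; the file path and the 128 MB value are hypothetical, and "dfs.block.size" is the 1.x name of the block-size property (2.x renamed it "dfs.blocksize").

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Override the HDFS V1 default of 64 MB for files written with this config
        conf.setLong("dfs.block.size", 128L * 1024 * 1024);

        FileSystem fs = FileSystem.get(conf);
        // "/data/sample.txt" is a hypothetical path used for illustration
        FSDataOutputStream out = fs.create(new Path("/data/sample.txt"));
        out.writeUTF("this file is stored in 128 MB blocks");
        out.close();
    }
}
```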

The HDFS component is in turn divided into two sub-components:

  1. Name Node -- The Name Node runs on the Master Node. It stores metadata about the Data Nodes, such as how many blocks are stored on each Data Node, which Data Nodes hold which data, Slave Node details, block locations, timestamps, etc. (see the sketch after this list).
  2. Data Node -- Data Nodes run on the Slave Nodes. They store our application's actual data, split into blocks of 64 MB by default.
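To see this metadata in action, a client can ask the Name Node where a file's blocks live. The sketch below uses the Hadoop 1.x FileSystem API; the file path is hypothetical.

```java
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocations {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/data/sample.txt"); // hypothetical file

        // The Name Node answers this query from its in-memory metadata:
        // it returns block-to-DataNode mappings, never the data itself.
        FileStatus status = fs.getFileStatus(file);
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.printf("offset %d, length %d, hosts %s%n",
                block.getOffset(), block.getLength(),
                Arrays.toString(block.getHosts()));
        }
    }
}
```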

MapReduce -- MapReduce is a distributed data processing, or batch processing, programming model. Like HDFS, the MapReduce component also uses commodity hardware to process a high volume and variety of data at a high velocity in a reliable and fault-tolerant manner.
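To make the model concrete, here is the classic word-count job written against the “MR V1” API (the org.apache.hadoop.mapred package). The map phase emits a (word, 1) pair for every token, and the reduce phase sums the counts per word; input and output paths come from the command line.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WordCount {

    // Mapper: for each input line, emit (word, 1) for every token
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output,
                        Reporter reporter) throws IOException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                output.collect(word, ONE);
            }
        }
    }

    // Reducer: sum the 1s emitted for each word
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf); // submits the job to the JobTracker and waits
    }
}
```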

The MapReduce component is in turn divided into two sub-components:

  1. Job Tracker -- The Job Tracker assigns MapReduce tasks to Task Trackers. It also reassigns tasks to other Task Trackers when the original Task Tracker fails or shuts down. The Job Tracker maintains the status of every Task Tracker: up/running, failed, recovered, etc.
  2. Task Tracker -- The Task Tracker executes the tasks assigned to it by the Job Tracker and reports the status of those tasks back to the Job Tracker.

[Figure: Hadoop 1.x HDFS and MapReduce sub-components]

How do the components in Hadoop work collaboratively?

When a client requests data from the Hadoop system:
  • The client request is first received by the Master Node.
  • The Master Node's MR component, the Job Tracker, receives the client's work, divides it into manageable independent tasks, and assigns those tasks to Task Trackers.
  • Each Slave Node's MR component, the Task Tracker, receives its tasks from the Job Tracker and performs the work using MapReduce.
  • Once all the Task Trackers have finished their work, the Job Tracker collects their results and combines them to produce the final result.
  • Finally, the Hadoop system sends the result back to the client.
[Figure: Hadoop 1.x component architecture]
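A rough sketch of this flow from the client's side, using the classic JobClient API: submitJob hands the work to the Job Tracker, which farms tasks out to Task Trackers while the client polls for progress. The mapper, reducer, and input/output paths are assumed to be configured as in the word-count sketch above.

```java
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class JobMonitor {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf();
        conf.setJobName("monitored-job");
        // ... set mapper, reducer, and input/output paths as in WordCount ...

        JobClient client = new JobClient(conf);
        RunningJob job = client.submitJob(conf); // hands the job to the JobTracker

        // Poll the JobTracker for the aggregate progress reported by Task Trackers
        while (!job.isComplete()) {
            System.out.printf("map %.0f%%  reduce %.0f%%%n",
                job.mapProgress() * 100, job.reduceProgress() * 100);
            Thread.sleep(5000);
        }
        System.out.println(job.isSuccessful() ? "Job succeeded" : "Job failed");
    }
}
```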

Limitations of Hadoop 1.x
  • No horizontal scalability of the Name Node.
  • Map and Reduce slots are static. Suppose 10 map and 10 reduce slots (10 + 10) are configured for a computation: while all the map slots are busy and all the reduce slots sit idle, the idle slots cannot be used for any other purpose.
  • It is only suitable for batch processing of huge amounts of data that are already in the Hadoop system.
  • It is not suitable for real-time data processing.
  • Only one NameNode can be configured, so if the NameNode fails the entire cluster goes down; that is why the NameNode is called a Single Point of Failure (SPOF).
  • Metadata is stored in the NameNode's memory (RAM), which limits a cluster to about 4,000 nodes.
  • Does not support NameNode High Availability.
  • There is no hot standby for the NameNode.
  • The JobTracker has to perform many activities: resource management, job scheduling, job monitoring, rescheduling of failed jobs, etc.
  • Does not support multi-tenancy.
